INTRODUCTION
ACSORT, copyright (c) 1989, T.N.T. Software Inc., is a shareware
program; you may try it without any obligation to pay anything. If you
do like it and use it after trying it, you should register your copy.
Registration will cost $25. Registered users are entitled to support,
will be sent source code on request, and will be notified of updates.
If you are not a registered user, you should not expect support; please
do not call or write and ask for any.
You may post ACSORT and this documentation on bulletin boards, pass
it around to your friends, and make copies of it. You may not sell
ACSORT without written permission from T.N.T. Software.
ACSORT (Advanced Cheapsort) was written by me, Bruce W. Tonkin.
About 90% of the program is Quick BASIC. The remainder is assembler.
The assembler routines were printed in Dr. Dobbs' Magazine as a part of
several articles I wrote about general sorting techniques.
CAPABILITIES OF ACSORT
ACSORT is a disk-based sort/merge program. It will sort data files
which have fixed-length records and fixed-length fields. The data file
being sorted may be of any reasonable size, up to 268,402,689 records
long and up to 2,147,483,648 bytes (but sorting that many records would
probably take about six months on a fast machine). You may have up to
50 sort fields, provided that the total length of all sort keys does not
exceed 252 bytes. If you need even longer keys, contact T.N.T.
Software; the modification for that is rather trivial, though it's not
likely to be needed.
The output file will be a list of record numbers (an index to the
data file in sorted order). The index itself can be read as either a
sequential file or a random file with a record length of 12. To save
disk space and increase sort speed, ACSORT will not output a whole
sorted file; if you need such output, see the sample programs at the end
of this documentation. The sample programs also show how to use the
index to read the data file in sorted order.
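For readers working outside BASIC, here is a minimal Python sketch of reading the data file through the index. It is not part of ACSORT. It assumes the index can be read as text, one record number per line (which is how the BASIC samples at the end of this documentation read it), and that record numbers start at 1. The function name and parameters are made up for illustration.

```python
def read_in_sorted_order(data_path, index_path, record_length):
    """Yield the data records in the order listed by the index file."""
    with open(data_path, "rb") as data, \
            open(index_path, "r", encoding="ascii") as index:
        for line in index:
            # Drop padding, line ends, and a possible DOS end-of-file mark.
            text = line.strip(" \t\r\n\x1a")
            if not text:
                continue
            record_number = int(text)  # assumed 1-based, as with BASIC GET
            data.seek((record_number - 1) * record_length)
            yield data.read(record_length)
```

Looping over `read_in_sorted_order("CUSTLIST.DAT", "CUSTLIST.INX", 138)` would then hand back the records one at a time in sorted order, without rewriting the data file.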
ACSORT will not sort sequential data files, such as those written
using a word processor; it is only intended for use with true random
files. For that reason, it will NOT sort dBase data files. dBase files
are not random files: the first part of the file is a variable-length
header and is not part of the data nor is it the same length as any of
the data records. ACSORT could be modified to read dBase files, but only
at the sacrifice of a fair amount of speed and efficiency. (Avoiding that
overhead is part of the reason ACSORT is faster than dBase sorts, and
Quick BASIC data base programs are much faster than dBase applications.)
ACSORT works correctly under PC DOS 2.0 and higher, and on hard
disks, floppy disks, and RAM (or memory) disks. ACSORT is not copy
protected in any way, and may be backed up or transferred freely to
other media. ACSORT is the copyrighted property of TNT Software Inc.,
and may not be sold without written permission of TNT Software.
You may sort data of any type commonly used in any Microsoft
language. Valid types include ASCII data (numbers or letters, in
alphabetic order), byte, packed integer or long integer, packed dates
(MDY format with each stored as a single character), dates in a variety
of other formats, packed single precision floating-point, packed double
precision, and any user-defined data type adhering to either the
Microsoft or IEEE floating point format. Character or byte data may
contain control characters (of course, so may any of the other types).
Many other sort programs have difficulty with anything other than ASCII data.
ACSORT is compatible with THE CREATOR, REPORTOR, PROGEN, REPGEN, and
other software sold or distributed by TNT SOFTWARE, INC. For that
reason, any record composed entirely of ASCII character 250 in every
byte of every field will be skipped (counted as a deleted record).
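If you write your own programs against the same data files, you may want to skip deleted records the same way. The following Python sketch shows the test described above. The function name is hypothetical, and treating an empty record as not deleted is my own assumption.

```python
DELETED_MARK = 250  # ASCII character 250, as described in the text


def is_deleted(record: bytes) -> bool:
    """Return True when every byte of the record is character 250."""
    # An empty record is treated as not deleted (an assumption).
    return len(record) > 0 and all(byte == DELETED_MARK for byte in record)
```

A record with even one byte different from 250 is treated as live data.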
ACSORT is totally compatible with Microsoft BASIC random-access
files. Other sorts may have some difficulties with Microsoft random
files, even appending unwanted control characters or additional records
or pointers to the file output. ACSORT will not.
You may sort unpacked numbers in numeric (rather than alphabetic)
order by choosing to sort that field as NUMERIC, rather than CHARACTER.
ACSORT will convert any such numbers into a packed double-precision
format internally; the sort will be accurate to 16 significant digits.
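The difference between CHARACTER and NUMERIC order can be seen in a small Python sketch. It only illustrates the idea of converting each unpacked number to double precision before comparing; it is not ACSORT's actual code, and the sample field values are invented.

```python
def numeric_key(field: bytes) -> float:
    """Convert an unpacked (text) number to a double-precision value."""
    return float(field.decode("ascii"))


# Four left-justified, space-padded fields of six bytes each (made-up data).
fields = [b"100   ", b"25    ", b"9     ", b"-3.5  "]

character_order = sorted(fields)                  # compares byte by byte
numeric_order = sorted(fields, key=numeric_key)   # compares by value
```

In character order "100" sorts ahead of "25" and "9" because the comparison stops at the first differing byte; in numeric order the values run -3.5, 9, 25, 100.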
ACSORT will:
(1) Read the command line for the name of the parameter file.
(2) Support subdirectories and path names for all file names.
(3) Buffer records internally for added speed.
(4) Allow records to be selected for or excluded from the sort.
ACSORT will only output an INDEX to the file in sorted order. This
index will consist of a list of record numbers. By reading the sorted
file in the order specified by the index, you will be reading the file
in the sorted order you specified.
For your added convenience, the sort index file can be read as a
random file of record length 12. This will allow you to use the index
file to do binary searches of the data, if you wish. To help you
further, ACSORT will always tell you exactly how many records were
sorted when the sort has been completed.
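A binary search through the index might look like the Python sketch below. It rests on several assumptions that are not stated in this documentation: that each 12-byte index record holds one record number as ASCII text (padded, with the line end included in the 12 bytes), that record numbers start at 1, and that the sort used a single ascending character key. All names are hypothetical.

```python
INDEX_RECORD_LENGTH = 12  # from the text: the index is readable as 12-byte records


def read_index_entry(index, position):
    """Return the record number stored in index entry `position` (0-based)."""
    index.seek(position * INDEX_RECORD_LENGTH)
    return int(index.read(INDEX_RECORD_LENGTH).decode("ascii").strip())


def find_record(data, index, count, record_length, key_start, key_length, wanted):
    """Binary-search the sorted index for a key; return its record number or None.

    `count` is the number of records ACSORT reported as sorted;
    `key_start` is 1-based, as in the parameter file.
    """
    low, high = 0, count - 1
    while low <= high:
        middle = (low + high) // 2
        record_number = read_index_entry(index, middle)
        data.seek((record_number - 1) * record_length + key_start - 1)
        key = data.read(key_length)
        if key == wanted:
            return record_number
        if key < wanted:
            low = middle + 1
        else:
            high = middle - 1
    return None
```

Both files would be opened in binary mode. Each probe costs one read of the index and one read of the data file, so even a large file needs only a handful of disk accesses per lookup.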
SETTING UP THE PARAMETER FILE
ACSORT can read all the requested inputs from the parameter file.
You may also enter all the inputs (except select and exclude criteria)
manually, at the keyboard. For illustration purposes, we'll call the
parameter file CSORT.DAT, which is assumed to be on the current drive
and in the current directory. You may name the parameter file anything
you like and put it on any drive and in any directory you like.
All inputs should be on separate lines. Multiple inputs on the same
line will usually cause errors of various types.
If an error is encountered while reading the CSORT.DAT file, ACSORT
will print a short descriptive error message: 'BAD INPUT FILE', 'BAD
OUTPUT FILE', etc. Those error messages should enable you to find and
fix the error fairly easily.
When running the sort, the only things the user will generally see
on the screen are the sort copyright message, whatever progress messages
are appropriate ('sorting through record #', 'merging'), and the sort
termination message.
The parameter file should contain the following items, on separate
lines (without line numbers) in the order specified. You can create
this file with a word processor or text editor, if it will produce a
plain ASCII file, or use the same procedure as you would use to create a
BATCH FILE from DOS.
1. Name of the file to sort;
2. Record length for sort file;
3. Name of index file;
4. Starting position for key (if zero go to step 9);
5. Length of key;
6. Data type for key (C=character or packed half-precision,
N=numeric, I=packed integer, O=old packed field type, F=IEEE-type
floating-point, L=Long integer, D=3-byte packed date, X=Date as
character string with year at end, Y=IEEE Floating-point single-
precision number containing date, Z=IEEE Floating-point double-
precision number containing date, y=Microsoft format single-
precision number containing date, z=Microsoft format double-
precision number containing date.);
7. Ascending or descending order (A/D);
8. Go back and repeat steps 4-7 until starting position is 0;
9. Work drive (A-Z is allowed, but you should NOT specify a
subdirectory);
10. Selection or exclusion criteria.
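If you generate parameter files from another program, the ordering of the ten items above can be sketched as follows. This is a hypothetical Python helper, not something supplied with ACSORT; the function and argument names are my own.

```python
def build_parameter_lines(data_file, record_length, index_file, keys,
                          work_drive, criteria=()):
    """Return the parameter-file lines in the order ACSORT expects.

    `keys` is a list of (start, length, data_type, order) tuples,
    for example (8, 25, "C", "A").
    """
    lines = [data_file, str(record_length), index_file]
    for start, length, data_type, order in keys:
        lines += [str(start), str(length), data_type, order]
    lines.append("0")         # a starting position of 0 ends the key list
    lines.append(work_drive)  # drive letter only, no subdirectory
    lines.extend(criteria)    # optional select/exclude lines
    return lines


lines = build_parameter_lines(
    "CUSTLIST.DAT", 138, "CUSTLIST.INX",
    [(8, 25, "C", "A"), (4, 4, "O", "D")],
    "C", ["X,4,4,O,<=,100"],
)
parameter_text = "\r\n".join(lines) + "\r\n"  # one input per line, DOS line ends
```

Writing `parameter_text` to a plain ASCII file gives one input per line, which is what the sort requires.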
All date fields may be in either MMDDYY or MMDDYYYY format except
packed date (D). Packed date fields are assumed to have the month
stored as one byte, the day as one byte, and the year as one byte, in
that order. Dates stored as character strings may use any delimiter or
no delimiter at all between day, month, and year--so long as the
delimiter is the same in each record. However, the day field should
always be the same size for each record. ACSORT will handle dates like
12/13/46 and 4/15/87 by padding the second date with a 0 on the left.
It will not sort dates like 7/4/89 in correct order because the day is
only one byte long (and determining how to fix dates like 12-7-88 and
4/14/77 would take too long to be worthwhile).
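To see why a packed date needs its own data type, consider this Python sketch. It assumes each of the three bytes holds its value directly (month, then day, then a two-digit year); that reading of the packed format is my assumption, and the function name is hypothetical.

```python
def packed_date_key(field: bytes):
    """Turn a 3-byte packed date (month, day, year) into a sortable key."""
    month, day, year = field[0], field[1], field[2]
    return (year, month, day)  # compare the year first, then month, then day


december_1946 = bytes([12, 13, 46])  # 12/13/46
april_1987 = bytes([4, 15, 87])      # 4/15/87

raw_order = sorted([december_1946, april_1987])
date_order = sorted([december_1946, april_1987], key=packed_date_key)
```

Sorted as raw bytes, the 1987 date comes first because its month byte is smaller; sorted by the rearranged key, the 1946 date correctly comes first.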
The beginning position of each key must be carefully specified. The
easiest way to understand how this should be done is with an example.
Suppose you want to sort a data file whose record length is 85; the
fields in each record are of lengths 15, 20, 5, 20, 10, 10, and 5.
Suppose you want to sort each record by field 4, then 3, then 6, then 1.
In that case, the starting position of the first key (field 4) would
be 15+20+5=40, plus 1 (skip the first 40 characters, which hold fields 1
through 3, and start the sort at the 41st character). The key length
would be 20, if you want to sort by all of the fourth field. Likewise,
the second key (field 3) begins at 15+20+1=36, the third key (field 6)
begins at 15+20+5+20+10+1=71, and the last key (field 1) begins at
position 1.
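The rule is simply "add up the lengths of all the fields in front of the key, then add 1". A short Python sketch of that arithmetic, using the field lengths from the example (the function name is hypothetical):

```python
def key_start(field_lengths, field_number):
    """Return the 1-based starting position of a field within the record."""
    return sum(field_lengths[:field_number - 1]) + 1


field_lengths = [15, 20, 5, 20, 10, 10, 5]  # record length 85

# Sort order from the example: field 4, then 3, then 6, then 1.
starts = [key_start(field_lengths, n) for n in (4, 3, 6, 1)]
```

The same function also confirms the record length: the position just past the last field, minus 1, is the sum of all the field lengths.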
You need not use all of a key field as a sort key. If your key
field is very long (a name field, with 80 characters allowed, for
example), you may wish to use only the first few characters of the field
as the key. Just enter whatever length you decide to use.
You may use any drive for your work files. If you're sorting a
small-enough data file (under 16000 records) and you have a small enough
total key length (depends on memory available), the sort may not actually
need to write anything to a work file. Even so, two work files are always
created: $$$.TMP and $$$$.TMP. The second one is a dummy file used for
internal buffering and will not be used to actually write any data to disk.
In any event, the work file space necessary will not exceed your
total key lengths plus four bytes, rounded to the nearest higher power
of two (if not a power of two already), times the number of records in
the data file. For example, if the total of key lengths is 50 bytes and
you are sorting 2000 records, the space required for your work file (if
needed) will be 64*2000=128000 bytes. If the total length of all keys
is 28 bytes, the space required for work files (if any are needed) will
be 32*2000=64000 bytes.
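The upper bound just described can be worked out ahead of time. Here is a minimal Python sketch of the calculation as stated in the text (the function name is my own):

```python
def work_file_bound(total_key_length, record_count):
    """Upper bound, in bytes, on the work file space for a sort."""
    needed = total_key_length + 4  # key bytes plus four
    slot = 1
    while slot < needed:           # round up to a power of two
        slot *= 2
    return slot * record_count


bound_50 = work_file_bound(50, 2000)  # 54 rounds up to 64
bound_28 = work_file_bound(28, 2000)  # 32 is already a power of two
```

Comparing the result with the free space on the work drive before starting can save you from a 'disk full' error late in the sort.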
When ACSORT begins to run, it will calculate how much memory is
available for the sort and display that number at the top of the screen.
Normally, on a 640K MSDOS machine there will be 300K to 400K bytes
available. ACSORT uses much of the remaining memory for internal file
buffers and its own program code. ACSORT will not hold more than 16,383
records internally, regardless of key length and memory available. So,
it's possible that ACSORT might use as little as 128K for actual sort
space.
The output index will always take exactly 12 bytes per record
sorted. The output index file will always be written before any work
files are deleted.
Be sure you have enough space available on the drives you designate
for your work and index files. It is disheartening to run the sort
almost to completion and get a 'disk full' error message.
It is interesting to note, in this connection, that ACSORT uses
generally less work file space than other sorts. You will probably find
that ACSORT will be capable of sorting data files other sorts cannot
touch.
RECORD SELECTION AND EXCLUSION
You may select or exclude records by including command lines of the
form:
S,start,length,kind,relationship,value
X,start,length,kind,relationship,value
Lines beginning with the letter S are used to select records. Lines
beginning with the letter X are used to exclude records from the sort.
You may enter selection and exclusion criteria in any order, but
exclusions will always be processed first.
The "start" parameter must be a number and should be the beginning
position of the part of the record used for selection or exclusion. The
"length" parameter should be the number of characters to use. The
"kind" parameter must be one of the letters I, O, N, C, F, L, or D. The
letter I will be used to indicate a packed integer field; O will
indicate an old-style floating-point packed number (not IEEE format); N
indicates an unpacked numeric field; C is a character field; F is an
IEEE-format packed floating-point field; L is a packed long integer
field (4 bytes); and D is a packed date (3 characters, MDY format).
The relationship must be one of "<", ">", "<=", ">=", "<>", or "=".
The value should be the character string or numeric value against which
the field should be compared.
Here are some examples of selection and exclusion:
S,3,4,F,<,34.56
Select only records for which the field starting at position 3 and going
for four bytes, when treated as an IEEE floating-point number, is less
than 34.56.
X,57,10,C,>=,"FLOSTERMAN"
Exclude records for which the field starting at position 57 and going
for ten bytes is greater than or equal to "FLOSTERMAN".
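The meaning of a single criterion line can be sketched in Python as below. This is only an illustration for the C and N kinds; it is not ACSORT's code. Padding the comparison value with spaces to the field length, and comparing characters by their byte values, are my assumptions, as are all the names.

```python
import csv
import operator

RELATIONS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
             ">=": operator.ge, "<>": operator.ne, "=": operator.eq}


def parse_criterion(line):
    """Split a line such as 'X,57,10,C,>=,"FLOSTERMAN"' into its parts."""
    action, start, length, kind, relation, value = next(csv.reader([line]))
    return action, int(start), int(length), kind, RELATIONS[relation], value


def matches(record: bytes, criterion) -> bool:
    """Return True when the record's field satisfies the criterion."""
    _action, start, length, kind, compare, value = criterion
    field = record[start - 1:start - 1 + length].decode("latin-1")
    if kind == "N":   # unpacked number: compare by value
        return compare(float(field), float(value))
    if kind == "C":   # character field: compare as text
        return compare(field, value.ljust(length))
    raise ValueError("only kinds C and N are handled in this sketch")
```

A record for which `matches` is true would be kept by an S line or dropped by an X line; how several criteria combine is not covered by this sketch.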
HOW TO RUN ACSORT
Simple: while at the DOS prompt (A>, B>, etc.) just type in the name
of the ACSORT file you have decided to use, followed by the name of the
parameter file. Use any drive or subdirectory specifiers necessary.
For example:
ACSORT B:CSORT.PAR
or
C:\DOS\ACSORT B:\DATAFILE\CSORT.PAR
Below are two sample parameter files you can use as templates for
your own sorting needs. For each file, I have included an explanation
of the actual command lines appearing to the left.
SAMPLE PARAMETER FILES
{Lines in the file}   {Explanation}

CUSTLIST.DAT          Name of file to sort
138                   Record length
CUSTLIST.INX          Name of output index file
8                     Starting position of first key
25                    Length of key
C                     Data type (capitalization counts!)
A                     Ascending order
0                     Starting position of next key (0 for none)
                      <-------- note: if this line is left blank, the work
                                drive will default to the current drive. If
                                this line does not appear, ACSORT will ask
                                the operator at run-time for the letter of
                                the work file drive.

CUSTLIST.DAT          Name of file to sort
138                   Record length
CUSTLIST.INX          Name of index file
8                     Starting position of first key
25                    Length of first key
C                     Data type of first key
A                     Ascending order
4                     Starting position of second key
4                     Length of second key
O                     Data type of second key
D                     Descending order
0                     Starting position of third key (0 to stop)
C                     Drive for work files is C
X,4,4,O,<=,100        Exclude if field starting at 4 and going for 4
                      bytes, considered as old-style floating-point
                      number, is less than or equal to 100.
SORT PERFORMANCE
The following speeds have been observed, running ACSORT on a 40
Megabyte Plus Development HardCard on a Tandy 4000 (80386, 16 MHz).
Times on a floppy disk will be longer and will depend on the drive type,
the media quality, and a number of other factors. Times on better hard
disks will be less.
SORTING 24765 RECORDS OF 138 BYTES EACH

Key Length   Type    Order        Time (sec)   Records/minute
    12       Alpha   Ascending      225.0           6603
    12       Alpha   Descending     224.4           6622
    25       Alpha   Ascending      228.7           6497
    25       Alpha   Descending     230.0           6460
    25       Alpha   Ascending       52.1          28511*
     4       Float   Ascending      226.1           6573
     4       Float   Descending     222.2           6687

*565 records were extracted via "exclude" from 24,765 and actually
indexed.

SORTING 2087 RECORDS OF 131 BYTES EACH

Key Length   Type    Order        Time (sec)   Records/minute
    25       Alpha   Ascending       11.1          11287
    25       Alpha   Descending      11.2          11181
SAMPLE GWBASIC PROGRAM TO READ A DATA FILE VIA AN INDEX
AND CREATE A WHOLE FILE OUTPUT
10 LINE INPUT"File to read:";F$
20 LINE INPUT"File to write:";W$
30 LINE INPUT"Index file name:";I$
40 INPUT"Record length:";R:'If record length >255 you will need to alter
50 OPEN"R",1,F$,R:' lines 60 and 80 for a more complicated
60 FIELD #1,R AS A$:' field statement. use as many variables as needed
70 OPEN"R",2,W$,R:' to use up all the characters in each record.
80 FIELD #2,R AS B$:' For example:
90 OPEN"I",3,I$:' FIELD #1, 255 AS A1$,255 AS A2$,255 AS A3$
91 ' Will handle a record of 765 bytes. you will
92 ' need to field #2 similarly, and do the LSETs in line
93 ' 110 as well. The LSETs can be done like this:
94 ' LSET B1$=A1$:LSET B2$=A2$:LSET B3$=A3$
95 ' Further note: if your record length exceeds 128, you
96 ' should be sure to enter basic with the /s: switch
97 ' set appropriately. See your basic manual.
100 PRINT"Transferring":ON ERROR GOTO 500
110 INPUT #3,A:GET 1,A:LSET B$=A$:COUNT=COUNT+1:PUT 2,COUNT
120 GOTO 110
500 PRINT"Done.";COUNT;"Records transferred."
510 CLOSE:END
SAMPLE QUICKBASIC PROGRAM TO READ A DATA FILE VIA AN INDEX
AND CREATE A WHOLE FILE OUTPUT
DEFLNG A-Z
LINE INPUT"File to read:";f$
LINE INPUT"File to write:";w$
LINE INPUT"Index file name:";i$
INPUT"Record length:";r
OPEN"R",1,f$,r: FIELD #1,r as a$
OPEN"R",2,w$,r: FIELD #2,r as b$
OPEN"I",3,i$
PRINT"Transferring"
WHILE NOT EOF(3)
    INPUT #3,a:GET 1,a:LSET b$=a$:count=count+1:PUT 2,count
WEND
PRINT"Done.";count;"Records transferred."
CLOSE:END
THANKS FOR PURCHASING ACSORT!
For a catalog of our other inexpensive, high-quality software for the
IBM PC or compatibles, write to:
T.N.T. SOFTWARE, INC.
34069 HAINESVILLE ROAD
ROUND LAKE, IL 60073
(312) 223-0832
IBM PC is a registered trademark of The IBM Corporation; THE CREATOR,
REPORTOR, PROGEN, REPGEN, and ACSORT are trademarks of T.N.T. SOFTWARE,
INC.